Feat/gemma4 adapters#1385
…utes - Unwrap text_config for Gemma4ForConditionalGeneration models - Read PLE, KV sharing, layer_types, softcapping from text_cfg - Add NotImplementedError guard for MoE variants (26B-A4B) - Update tests to exercise text_config path
jlarson4
left a comment
Hey @huseyincavusbi, glad to finally see this come through. I have a couple of comments below; take a look when you have a moment and let me know what you think.
Additionally, @punishell has recently opened #1377, which is a parallel implementation of Gemma4. I'd like to include bits of both your implementations where it makes sense and is relevant. They came up with a very straightforward solution for the KV-cache issue that might be of use to you, if you want to try rebasing your work onto theirs as an extension point. I am thinking there may be a way to use their DelegatedAttentionBlockBridge in combination with the work you put into adding Gemma4 support to position_embeddings_attention to provide even better overall support.
There are more moving parts here than anticipated. If you have questions, please feel free to ask.
import pytest

from transformer_lens.config.TransformerBridgeConfig import TransformerBridgeConfig
Since you began this PR, the TransformerBridgeConfig import path was adjusted due to a name conflict introduced in a related refactor. Please update this to
from transformer_lens.config import TransformerBridgeConfig
# with a specific transformers version). Set self.cfg.use_native_generate = True
# in the adapter's __init__.
if getattr(self.cfg, "use_native_generate", False):
    return self.hf_generate(
This delegation drops kwargs that a user may pass in: stop_at_eos, prepend_bos, padding_side, freq_penalty, use_past_kv_cache, and the new stop_strings/stopping_criteria added in #1374, to name a few. Someone using Gemma4 who calls generate(..., stop_strings=".") would have it silently ignored.
If you end up opting to keep use_native_generate, we will need to make sure all relevant kwargs are properly passed through.
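To illustrate the concern, here is a minimal sketch of a guard that makes dropped kwargs visible instead of silently ignoring them. The helper name split_generate_kwargs and the FORWARDED set are hypothetical and not part of the bridge; which kwargs can actually be forwarded would need to be checked against the real generate() and hf_generate() signatures.

```python
import warnings
from typing import Any, Dict, Set

# Assumption: kwargs the delegated path could forward unchanged.
# This set is illustrative only; verify it against the real signatures.
FORWARDED: Set[str] = {
    "max_new_tokens",
    "temperature",
    "top_k",
    "top_p",
    "do_sample",
    "stop_strings",
    "stopping_criteria",
}


def split_generate_kwargs(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Return only the kwargs that will be forwarded, warning about the rest.

    Hypothetical helper: it does not call any model, it only partitions kwargs.
    """
    forwarded = {k: v for k, v in kwargs.items() if k in FORWARDED}
    dropped = sorted(k for k in kwargs if k not in FORWARDED)
    if dropped:
        # Surface the problem to the caller rather than ignoring it.
        warnings.warn(
            f"Delegated generation ignores these kwargs: {dropped}",
            stacklevel=2,
        )
    return forwarded
```

With a guard like this, a call that passes freq_penalty or padding_side on the delegated path would at least produce a warning. Raising an error instead of warning is a stricter alternative.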
return self.hf_generate(input, **hf_kwargs)

# Adapters can opt-in to delegating generation to HF's native generate()
# (e.g. when the bridge's custom attention has a KV-cache incompatibility
Does no-cache hooked generation (use_past_kv_cache=False) work for Gemma4 with this incompatibility? If so, that's a better stopgap than delegating to hf_generate. It preserves hooks and lets you drop the use_native_generate flag. If you could dig into that and let me know what you find, I'd appreciate it.
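For reference, a toy sketch of what no-cache generation means and why it keeps hooks intact. Everything here (toy_forward, generate_no_cache, the hook callback) is a hypothetical stand-in and not the bridge's API: each step re-runs the forward pass over the full sequence, so a hook fires on every step and no cached K/V state is involved.

```python
from typing import Callable, List, Optional


def toy_forward(tokens: List[int]) -> int:
    """Stand-in for a model forward pass; returns a next-token id."""
    return (sum(tokens) + len(tokens)) % 50


def generate_no_cache(
    prompt: List[int],
    max_new_tokens: int,
    hook: Optional[Callable[[List[int]], None]] = None,
) -> List[int]:
    """Greedy decoding without a KV cache.

    The whole sequence is fed to the forward pass at every step, so there
    is no cached state whose shape could mismatch, at the cost of redoing
    the work for earlier positions each time.
    """
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        if hook is not None:
            hook(tokens)  # the hook sees the full sequence on every step
        tokens.append(toy_forward(tokens))
    return tokens
```

The trade-off is compute: generation cost grows with sequence length at every step, which is usually acceptable as a stopgap for short interpretability runs.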
Description
This PR adds TransformerBridge support for the Gemma 4 model family (E2B, E4B, 26B-A4B, and 31B) through a single unified Gemma4ArchitectureAdapter.
Key Implementation Details
- gemma4.py: Dynamically handles all 4 variants by evaluating initialization configuration flags:
  - enable_moe_block=True (specifically for the 26B variant).
  - num_kv_shared_layers > 0 (for E2B/E4B).
  - hidden_size_per_layer_input > 0.
- position_embeddings_attention.py: Applies V norm post-reshape (Gemma 4 is the first architecture featuring per-head value normalization). Handles KV-sharing delegation to Hugging Face's original attention implementation when K/V submodules are omitted. Caches computed KV states in shared_kv_states post-RoPE for structural layer reuse.
- bridge.py: Introduces a use_native_generate opt-in flag. This bypasses a current Hugging Face transformers dev-version issue where eager attention causes a KV-cache dimension mismatch during generation. Setting this flag (scoped strictly to this adapter) delegates processing to HF's native generate() utilizing SDPA.
- main_benchmark.py: Fixes pad_token_id assignment when eos_token_id is a list (Gemma4 uses [1, 106]), taking the first element.
Verification & Performance
All models have been validated.
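The flag-based variant handling listed under Key Implementation Details, together with the text_config unwrapping from the follow-up commit, can be sketched roughly as follows. This is a minimal illustration, not the adapter's code: unwrap_text_config and describe_variant are hypothetical names, and the config objects are plain namespaces standing in for Hugging Face config objects.

```python
from types import SimpleNamespace
from typing import Any, Dict


def unwrap_text_config(cfg: Any) -> Any:
    """Return cfg.text_config when present (multimodal wrapper), else cfg."""
    text_cfg = getattr(cfg, "text_config", None)
    return text_cfg if text_cfg is not None else cfg


def describe_variant(cfg: Any) -> Dict[str, bool]:
    """Read the three flags named in the description from the text config."""
    text_cfg = unwrap_text_config(cfg)
    return {
        # MoE block enabled (26B variant, per the description)
        "moe": bool(getattr(text_cfg, "enable_moe_block", False)),
        # KV sharing across layers (E2B/E4B, per the description)
        "kv_sharing": (getattr(text_cfg, "num_kv_shared_layers", 0) or 0) > 0,
        # Per-layer input embeddings
        "per_layer_input": (
            getattr(text_cfg, "hidden_size_per_layer_input", 0) or 0
        ) > 0,
    }


# Usage with stand-in config objects; the numeric values are arbitrary.
wrapped = SimpleNamespace(
    text_config=SimpleNamespace(
        num_kv_shared_layers=2, hidden_size_per_layer_input=8
    )
)
features = describe_variant(wrapped)
```

Reading every flag through the unwrapped config keeps a single code path for both the wrapped and flat config layouts.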
Fixes #1297
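The pad_token_id fix mentioned for main_benchmark.py amounts to something like the sketch below. The helper name first_eos_as_pad is hypothetical, and the empty-list fallback to None is an assumption rather than something the PR states.

```python
from typing import Optional, Sequence, Union


def first_eos_as_pad(
    eos_token_id: Union[int, Sequence[int], None]
) -> Optional[int]:
    """Pick a single pad token id from an eos_token_id that may be a list."""
    if isinstance(eos_token_id, (list, tuple)):
        # Gemma4 reports a list such as [1, 106]; take the first element.
        return eos_token_id[0] if eos_token_id else None
    return eos_token_id


pad_id = first_eos_as_pad([1, 106])  # -> 1
```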